-
Notifications
You must be signed in to change notification settings - Fork 1
Udovenko_git_pain #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
def my_method(self): | ||
print("my_method called!") | ||
@staticmethod | ||
def write(data, output_name, indent=2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out_filename
could be a better parameter name thanoutput_name
.- It's a good idea to provide a default value for the
indent
param.
print("my_method called!") | ||
@staticmethod | ||
def write(data, output_name, indent=2): | ||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid imports that are near to the place where they are used. Contrary to the local variables (that should be defined closer to the exact place where they are used), all imports should be at the top of your module.
import json | ||
with open(output_name, 'w') as out_file: | ||
parsed = json.loads(json.dumps(data)) | ||
json.dump(parsed, out_file, indent=indent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just json.dump(data, out_file, indent=indent)
?
|
||
def __clr_tags_to_metadata(self, row): | ||
# task 1 and 2 for the tags array || and task 3 for the text | ||
self.sign_result = re.findall(self.sign_patt, row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Remember, we're discussed that we shouldn't mutate the internal state in methods. So, avoid
self.something =
assignments. If you need this value later, in another method - this is a sequential dependency. It's a bad thing, don't use the internal state to communicate between methods. Use methods params for that. - The
self.sign_result
is not used in this method anymore, so, there's no practical need to calculate its value here, in this method. This line of code should be moved to the__remove_dollar_sign_words
method.
self.name = input_file_name | ||
|
||
# list of patterns for regular expressions | ||
self.tags_patt = re.compile(r",\[.*\]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a good idea to pre-create all the data you need in your methods right here, in your __init__()
.
.replace("'", "'").replace(""", '"').replace('’', '\'') | ||
|
||
for sign in self.sign_result: | ||
self.clr_row = self.clr_row.replace(sign, '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, you use the internal state to store the value (self.clr_row
) that is required for the subsequent method calls. Just make it a local variable and return it after all required mutations.
for row in data_file: | ||
row_dict = dict() | ||
|
||
self.__clr_tags_to_metadata(row=row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code should have been implemented in the following way:
metadata, tags_result = self.__clr_tags_to_metadata(row) # don't be afraid of multiple return values, sometimes it's a good thing
clr_row = self.__remove_dollar_sign_words(row, tags_result)
body_tags = self.__place_tags_wrd_to_body_tags(clr_row)
orphan_tokens = self.__tokenize_add_orphan_tokens(clr_row)
row_dict["body"] = body # you could just get the `body` from `row` while reading the parsed file
row_dict["body_tags"] = body_tags
row_dict["metadata"] = metadata
row_dict["orphan_tokens"] = orphan_tokens
Put this way, your program is just a pipe that transforms the data, each piece of this pipe is not connected to any other pieces with anything except its params, return value(s) and (seldom) reading the internal state.
|
||
def __place_tags_wrd_to_body_tags(self, clr_row): | ||
# task 2 for the text | ||
self.body_tags = re.findall(self.body_patt, clr_row) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing into the internal state - this should be avoided. Make it the return value of the method.
|
||
for token in tokens_clr: | ||
if lesk(tokens_clr, token) is None: | ||
self.orphan_tokens.append(token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing into the internal state - this should be avoided. Make it the return value of the method.
@@ -0,0 +1,101 @@ | |||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, this file is obsolete and can be deleted.
No description provided.